The GLYCAM condensed oligosaccharide notation

The condensed oligosaccharide notation is recognized by GLYCAM-Web when you build a structure via URL or by text entry online.  It is very similar to the “glycam” notation but more compact, and it uses the common abbreviations given in Table 1 on the page describing the force field residue naming convention.

The Basics

The primary differences between condensed and glycam are:

  • The location of the alpha/beta indication
    • In condensed notation, this indication appears next to the linkage, after the monosaccharide name, not at the beginning of the monosaccharide.
  • The quantity of punctuation
    • Condensed notation reduces the required number of characters.

For example,

glycam: a-D-Glcp-(1-4)-b-L-Manp-OH
condensed: DGlcpa1-4LManpb1-OH
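
The rearrangement shown above can be sketched in a few lines of Python. This is only an illustration, not GLYCAM-Web code: the function name `glycam_to_condensed` is hypothetical, the pattern for the glycam form is inferred from the single example above, only linear sequences are handled, and the anomeric carbon of the last residue is assumed to be 1 unless stated otherwise.

```python
import re

# Pattern inferred from the one example above (an assumption):
#   <anomer>-<isomer>-<name>-  optionally followed by  (<carbon>-<position>)-
_RESIDUE = re.compile(
    r"(?P<anomer>[ab])-(?P<isomer>[DL])-(?P<name>[A-Za-z0-9]+)-"
    r"(?:\((?P<carbon>\d)-(?P<position>\d)\)-)?"
)


def glycam_to_condensed(glycam, terminal_carbon="1"):
    """Convert a linear glycam-style sequence to condensed notation."""
    pos = 0
    parts = []
    while True:
        m = _RESIDUE.match(glycam, pos)
        if m is None:
            break
        # The last residue has no "(1-4)" group, so its anomeric carbon
        # falls back to terminal_carbon (assumed, e.g. 1 for aldoses).
        carbon = m["carbon"] or terminal_carbon
        link = m["position"] or ""
        # Condensed order: isomer, name, anomer, then the linkage.
        parts.append(f"{m['isomer']}{m['name']}{m['anomer']}{carbon}-{link}")
        pos = m.end()
    if not parts:
        raise ValueError(f"no residues recognized in {glycam!r}")
    # Whatever remains is the moiety at the reducing end (e.g. OH or OME).
    return "".join(parts) + glycam[pos:]


print(glycam_to_condensed("a-D-Glcp-(1-4)-b-L-Manp-OH"))
```

The anomer letter moves from the front of each residue to just before its linkage, and the parentheses and most hyphens disappear.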

Important things to note:

  • There can be no spaces in the notation.
  • You must specify the ring shape (pyranose or furanose).
  • You must specify the isomer (D or L).
  • The identity of the moiety at the reducing end must be specified explicitly (e.g., -OH or -OME).

For example, consider cellobiose and its methyl glycoside:

cellobiose: DGlcpb1-4DGlcpb1-OH
its methyl glycoside: DGlcpb1-4DGlcpb1-OME
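
The four rules above lend themselves to a quick pre-submission check. The sketch below is an assumption-laden illustration rather than the validation GLYCAM-Web performs: `check_basics` is a hypothetical name, the sugar list covers only the abbreviations used on this page, and the accepted reducing-end moieties are limited to the two mentioned above.

```python
import re

# Assumptions: a partial list of sugar abbreviations (those used on this
# page) and only the two reducing-end moieties mentioned in the text.
KNOWN_SUGARS = ("Glc", "Man", "Gal", "Rha", "Fru")
KNOWN_AGLYCONS = ("OH", "OME")

_RES = re.compile(
    r"(?P<isomer>[DL])?(?P<sugar>" + "|".join(KNOWN_SUGARS) + r")(?P<ring>[pf])?"
)


def check_basics(seq):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if any(ch.isspace() for ch in seq):
        problems.append("contains whitespace")
    for m in _RES.finditer(seq):
        if m["isomer"] is None:
            problems.append(f"{m['sugar']} at index {m.start()}: missing D or L")
        if m["ring"] is None:
            problems.append(f"{m['sugar']} at index {m.start()}: missing p or f")
    if not seq.endswith(tuple("-" + a for a in KNOWN_AGLYCONS)):
        problems.append("reducing-end moiety not specified")
    return problems


print(check_basics("DGlcpb1-4DGlcpb1-OH"))   # no problems expected
print(check_basics("Glcpb1-4DGlcb1-OH"))     # missing isomer, missing ring form
```

A sequence with an anomeric-to-anomeric linkage has no trailing moiety, so this check would flag it; a fuller validator would need to treat that case separately.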

Anomeric-Anomeric Linkages

In anomeric-to-anomeric linkages (e.g., sucrose and trehalose), the notation is the same, except that no trailing linkage information is given, because both anomeric carbons are already involved in the linkage.

trehalose: DGlcpa1-1DGlcpa
sucrose: DFrufb2-1DGlcpa

Branches

Branches are handled with square brackets.

Consider an α-L-rhamnopyranoside with DManpa1-3DManpa1- attached at position three and DGalpb1-4DGalpb1- attached at position four.  Expanded out to show branching, that would be:

DManpa1-3DManpa1-3-LRhapa1-OH
                       | 
      DGalpb1-4DGalpb1-4

The condensed sequence is:

DManpa1-3DManpa1-3[DGalpb1-4DGalpb1-4]LRhapa1-OH

The main chain

Notably, contrary to typical IUPAC convention, we do not set the main chain to be the longest chain.  Instead, starting at the reducing end, the main chain follows the lowest-numbered attachment point at each residue (for example, O2 in preference to O3 or O6) until there are no more residues along the path.

We do this because the biochemistry causes the longest chain to shift from structure to structure.  If we keep the main chain ordered by attachment number, then it is easier to compare structures at different points in a biochemical pathway.

This method of ordering also ensures that there is only one way to specify the sequence for any given glycan.

Consider the glycan in the image just below.  In this example, the main chain also happens to be the longest, but the ordering is still determined by the attachment points.

[Figure: ordered_branch]

The condensed sequence for this structure is:

DManpb1-2DManp[6A]b1-2[DGalpb1-6]DManpb1-2DManp[6Me]b1-2[DGlcpb1-6DManp[2S]a1-4]DGalpa1-OME

The main chain is the part of the sequence outside the branch brackets: DManpb1-2DManp[6A]b1-2DManpb1-2DManp[6Me]b1-2DGalpa1-OME.  Note that we have also drawn the structure so that the lowest-numbered attachment point is always at the bottom.

To emphasize this point, consider this oligosaccharide:

[Figure: ungrouped]

The condensed sequence is:

DManpa1-3[DGalpb1-4DGalpb1-4DGalpb1-4DGalpb1-4]LRhapa1-OH

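Here the main chain is the short one, DManpa1-3LRhapa1-OH, because it attaches at position 3.  The longer galactose chain attaches at position 4, so it is written as the branch.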
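
The ordering rule can be made concrete with a small sketch that writes a condensed sequence from a tree of residues. The `Residue` class and `condensed` function are hypothetical names introduced for illustration, not part of any GLYCAM tool. The sketch takes the child at the lowest-numbered attachment point as the main chain and brackets every other child; emitting several branches in ascending position order is an assumption, since the examples on this page show only one branch per residue.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    name: str                 # isomer + sugar + ring form + anomer, e.g. "DManpa"
    anomeric_carbon: int = 1
    # Maps an attachment position on this residue to the residue linked there.
    children: dict = field(default_factory=dict)


def condensed(residue, attached_at=None):
    """Serialize a residue and everything attached to it."""
    out = ""
    for i, pos in enumerate(sorted(residue.children)):
        text = condensed(residue.children[pos], pos)
        # Lowest-numbered attachment continues the main chain; the rest
        # are written as bracketed branches (ascending order is assumed).
        out += text if i == 0 else f"[{text}]"
    link = "" if attached_at is None else str(attached_at)
    return f"{out}{residue.name}{residue.anomeric_carbon}-{link}"


def chain(name, position, length):
    """Build a linear homopolymer whose residues are all linked at `position`."""
    node = Residue(name)
    for _ in range(length - 1):
        node = Residue(name, children={position: node})
    return node


rha = Residue("LRhapa", children={3: Residue("DManpa"), 4: chain("DGalpb", 4, 4)})
print(condensed(rha) + "OH")
```

For the rhamnoside with one mannose at position 3 and four galactoses at position 4, this puts the four-residue galactose chain in brackets even though it is the longer of the two.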

L-sugars and SNFG notation

The iconic representations in the two figures above are derived from version 1 of the SNFG notation.  That version did not have a mechanism for specifying D- or L-sugars.  Because it is very important to note such differences in computational chemistry, we began using a dashed outline, as in the Rha (L-rhamnose) residue above, to denote an L-sugar.

The current SNFG version has a mechanism for specifying L-sugars, but we still find the dashed outline easy to read in a structure.  The current SNFG notation is described in full here:  https://www.ncbi.nlm.nih.gov/glycans/snfg.html

Repeating Units

The notation is very picky about what actually constitutes a repeating unit.  Consider this sequence:

DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-OH

The proper annotation is this:

DGlcpa1-[4DGlcpa1-]<9>OH

If you think about it, the first residue is different from the next nine in that it doesn’t have anything attached to it at its fourth position.  So, although there are ten glucopyranose residues, there are only nine that are identical.
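
Expanding a repeat annotation back into the full sequence is a simple text substitution, sketched below. The function name `expand_repeats` is hypothetical, and the sketch assumes that the repeated unit contains no nested square brackets (no derivatives or branches inside the repeat).

```python
import re

# A repeat is "[unit]<count>".  Branch and derivative brackets are not
# followed by "<count>", so they are left untouched.
_REPEAT = re.compile(r"\[([^\[\]]+)\]<(\d+)>")


def expand_repeats(seq):
    """Replace every [unit]<n> with n copies of unit."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), seq)


full = expand_repeats("DGlcpa1-[4DGlcpa1-]<9>OH")
print(full)
print(full.count("DGlcpa"))  # one leading residue plus nine repeated ones
```

The repeated unit here is 4DGlcpa1-, which carries the attachment position of the preceding residue; that is why the first residue cannot be folded into the repeat.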

Cycles

Oligosaccharides occasionally form cycles.  Here, the cycle of concern isn’t the ring in each pyranose or furanose monosaccharide.  Rather, we are concerned with sequences of monosaccharides that form a ring.

A common cycle of this sort is β-cyclodextrin, which is composed of seven DGlcpa monosaccharides, all linked 1-4 and forming a ring.

Cycles are handled with curly braces and the letter ‘c’.

One representation of β-cyclodextrin is:

{c4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-4DGlcpa1-}

For this oligosaccharide, the annotation for a cycle and a repeat unit can be combined to make this simpler and more readable form:

{c[4DGlcpa1-]<7>}
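
As a quick check that the two forms describe the same ring, the sketch below splits a cyclic sequence into its units. The function name `cycle_units` is hypothetical, and the sketch assumes a simple cycle with no branches or derivatives, where every unit starts with an attachment position and ends with a hyphen.

```python
import re


def cycle_units(seq):
    """Return the linked units of a simple cyclic sequence, in order."""
    if not (seq.startswith("{c") and seq.endswith("}")):
        raise ValueError("not a cyclic sequence")
    body = seq[2:-1]
    # Expand any [unit]<n> repeat annotation first.
    body = re.sub(
        r"\[([^\[\]]+)\]<(\d+)>",
        lambda m: m.group(1) * int(m.group(2)),
        body,
    )
    # Each unit: attachment position, residue, anomeric carbon, hyphen.
    return re.findall(r"\d[^-]+-", body)


units = cycle_units("{c[4DGlcpa1-]<7>}")
print(len(units), units[0])
```

In a cycle every residue is substituted at position 4, so all seven units are identical and the whole ring fits inside one repeat.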

Derivatives

Derivatives are handled in two ways.

First, if the derived monosaccharide is well enough known that it already has a common abbreviated form, we use that.  For example, 2-N-acetyl-β-D-glucosamine, commonly called β-D-GlcNAc, would be represented as:

DGlcpNAcb1-OH

Notice the ‘p’ that has been inserted between Glc and NAc.  Again, in computational chemistry we must be very specific: there is no chemical reason that GlcNAc couldn’t be in a furanose form.  So, it is necessary in this notation to specify.

Second, for the majority of derived monosaccharides, which do not have common names, the derivative is handled in a manner similar to a branch, with square brackets.  The differences are:

  • The brackets are placed between the abbreviated monosaccharide name, including the ring-shape designator (p or f), and the anomeric configuration designator (a or b).
  • Multiple derivatives go in the same brackets and are separated by commas.

For example, the GlcNAc above could be written as:

DGlcp[2NAc]b1-OH

If it were sulfated at position 6, it would be:

DGlcp[2NAc,6S]b1-OH

You could combine the common form above with the bracketed form, too, and get:

DGlcpNAc[6S]b1-OH
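
The placement rules for derivative brackets can be summarized in a short formatting sketch. The function name `residue_token` and its parameters are hypothetical, and listing the derivatives in ascending position order is an assumption based on the examples above.

```python
def residue_token(isomer, sugar, ring, anomer, derivatives=None, carbon=1):
    """Assemble one residue, e.g. DGlcp[2NAc,6S]b1.

    derivatives maps a ring position to a derivative code, e.g. {2: "NAc"}.
    """
    bracket = ""
    if derivatives:
        # All derivatives share one pair of brackets, separated by commas.
        items = ",".join(f"{pos}{code}" for pos, code in sorted(derivatives.items()))
        bracket = f"[{items}]"
    # The brackets sit after the ring-shape designator and before the anomer.
    return f"{isomer}{sugar}{ring}{bracket}{anomer}{carbon}"


print(residue_token("D", "Glc", "p", "b", {2: "NAc"}) + "-OH")
print(residue_token("D", "Glc", "p", "b", {2: "NAc", 6: "S"}) + "-OH")
```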

Consider this glycosaminoglycan (GAG):

[Figure: HPS]

GAGs are often highly sulfated.  The condensed notation for this structure would be:

DGalpNAc[4S,6S]b1-4DGlcpA[2S]b1-3DGlcpNSa1-4DGlcpA[2S]b1-4DGlcpNS[3S]a1-OME
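
To see how the pieces of such a sequence fit together, here is a sketch that splits a linear, unbranched sequence into residues and lists the derivatives on each. The names `parse_linear` and `_TOKEN` are hypothetical, and the pattern assumes three-letter sugar abbreviations, single-digit positions, and at most one derivative bracket per residue; it is not the parser used by GLYCAM-Web.

```python
import re

# One residue between hyphens, e.g. "4DGlcpA[2S]b1":
#   optional attachment position, isomer, sugar, ring form, optional
#   common-name suffix (NAc, A, NS, ...), optional derivative bracket,
#   anomer, anomeric carbon.
_TOKEN = re.compile(
    r"^(?P<position>\d)?(?P<isomer>[DL])(?P<sugar>[A-Z][a-z]{2})(?P<ring>[pf])"
    r"(?P<suffix>[A-Za-z0-9]*?)(?:\[(?P<derivs>[^\]]*)\])?"
    r"(?P<anomer>[ab])(?P<carbon>\d)$"
)


def parse_linear(seq):
    """Return (list of residue dicts, reducing-end moiety)."""
    *tokens, aglycon = seq.split("-")
    residues = []
    for tok in tokens:
        m = _TOKEN.match(tok)
        if m is None:
            raise ValueError(f"unrecognized residue: {tok!r}")
        info = m.groupdict()
        # "position" is where the preceding residue attaches to this one.
        info["derivs"] = info["derivs"].split(",") if info["derivs"] else []
        residues.append(info)
    return residues, aglycon


gag = ("DGalpNAc[4S,6S]b1-4DGlcpA[2S]b1-3DGlcpNSa1-"
       "4DGlcpA[2S]b1-4DGlcpNS[3S]a1-OME")
residues, aglycon = parse_linear(gag)
for r in residues:
    print(r["sugar"] + r["suffix"], r["derivs"])
```

Splitting on hyphens works here only because the sequence is unbranched; a branched sequence would need the square brackets handled first.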